feat(csharp): wire connection-level telemetry into StatementExecutionConnection (PECO-3022)#463
Draft
jadewang-db wants to merge 14 commits into
Draft
feat(csharp): wire connection-level telemetry into StatementExecutionConnection (PECO-3022)#463jadewang-db wants to merge 14 commits into
jadewang-db wants to merge 14 commits into
Conversation
This was referenced May 19, 2026
added 14 commits
May 22, 2026 20:06
…erMode\n\nTask ID: task-1.1-refactor-connection-telemetry-create
- High: preserve fail-open contract — wrap the TSessionHandle Guid byte[] conversion in InitializeTelemetry with try/catch. Pre-refactor the same conversion lived inside ConnectionTelemetry.Create's outer try/catch so a malformed session GUID degraded to NoOp telemetry; moving it to the transport boundary lost that guarantee. - Medium: remove SafeBuildSystemConfiguration_..._FallbackPath — it passed string.Empty for assemblyVersion expecting a throw, but CreateSystemConfiguration coalesces empty string. The catch block is never reached. The CanonicalConstant_HasExpectedLiteralValue test already pins both branches by construction. - Low: rename Create_ThrowingHttpClient_ReturnsNullConnectionTelemetry to ...ReturnsNoOpConnectionTelemetry — actual return is the NoOp singleton. - Low: document the test's implicit assumption that the empty-host throws inside HttpClientFactory/TelemetryClientManager so future defensive handling there doesn't silently turn this into a non-test. - Low: add Create_EmptySessionId_MapsToNullInContext to pin the string.Empty -> null SessionId mapping at ConnectionTelemetry.cs:133. Co-authored-by: Isaac
…on\n\nTask ID: task-2.1-observer-interface-and-null-impl
…ryContext + enqueue\n\nTask ID: task-2.2-telemetry-observer-impl
…server\n\nTask ID: task-2.3-safe-observer-decorator
…\n\nTask ID: task-3.1-refactor-databricks-statement-observer
…elemetryContext downcast
…D: task-4.1-wire-sea-connection-telemetry
…t _waitTimeoutSeconds) in SEA
25b5e6e to
44b2001
Compare
CurtHagenlocher
pushed a commit
to CurtHagenlocher/databricks
that referenced
this pull request
May 24, 2026
) ## 🥞 Stacked PR Use this [link](https://github.com/adbc-drivers/databricks/pull/455/files) to review incremental changes. - [**stack/PECO-3022-sea-telemetry-design**](adbc-drivers#455) [[Files changed](https://github.com/adbc-drivers/databricks/pull/455/files)] - [stack/pr-phase1-connection-telemetry-create-refactor](adbc-drivers#460) [[Files changed](https://github.com/adbc-drivers/databricks/pull/460/files/c0dfbed80fd09c91b9e2e5ed8050a268435618bd..d3190aeb6f2f1c727b359d0ef40d26be2280c73e)] - [stack/pr-phase2-observer-interface](adbc-drivers#461) [[Files changed](https://github.com/adbc-drivers/databricks/pull/461/files/b2340d9d32da68d5b81756e0a77008f05aadd45b..b0d9f02236bf7f99e93132cfe6ed4dd119fc1e73)] - [stack/pr-phase3-databricks-statement-observer](adbc-drivers#462) [[Files changed](https://github.com/adbc-drivers/databricks/pull/462/files/b0d9f02236bf7f99e93132cfe6ed4dd119fc1e73..fa799fcd0cc2209982eae2890e16e26854a0649e)] - [stack/pr-phase4-sea-connection-telemetry](adbc-drivers#463) [[Files changed](https://github.com/adbc-drivers/databricks/pull/463/files/f0e344aa4d9d341302ee9b3a5217a6794d8ba189..25b5e6eaf2f2f04941d07bbbe582845254630950)] - [stack/pr-phase5-sea-statement-telemetry](adbc-drivers#464) [[Files changed](https://github.com/adbc-drivers/databricks/pull/464/files/9dc278a249433000b03e2d3a4cdf7daa151caa69..8109047005a89804f243beb3a6c689f43b539506)] --------- ## Summary of the design This design closes the gap where the SEA (REST) transport in the C# ADBC Databricks driver emits **zero** client telemetry — no session events, no per-statement operation events, no error events, no chunk metrics. SEA traffic is currently invisible to the `eng_lumberjack` driver-telemetry pipeline. The design: - Introduces a small **observer interface** (`IStatementOperationObserver`) between statement classes and the telemetry implementation. The interface is shaped around the statement's lifecycle (`OnExecuteStarted`, `OnPollCompleted`, …) rather than telemetry's data model. Contract: fail-open — implementations must not throw. - **Refactors `ConnectionTelemetry.Create`** to accept `string sessionId` (dropping its Thrift `TSessionHandle` coupling) and a `DriverMode mode` parameter. Both transports use the same factory. - Wires observer callbacks at the SEA hookpoints in `StatementExecutionConnection` and `StatementExecutionStatement`. - **Reuses all existing telemetry infrastructure as-is**: `TelemetryClient`, `TelemetryClientManager`, `CircuitBreakerTelemetryExporter`, `DatabricksTelemetryExporter`, `FeatureFlagCache`, `TelemetrySessionContext`, `StatementTelemetryContext`. ## Key decisions and alternatives considered - **Observer interface over static helper or instance emitter.** The gap-fix doc proposed a static `TelemetryHelper`. This design supersedes that with an observer-shaped interface because: (a) it gives one-line callsites with no parameter threading, (b) it decouples statement classes from telemetry types so future tracing/audit observers can plug in without touching statement code, (c) it's trivially mockable for tests. The fail-open contract pushes all try/catch into the implementations exactly once — callsites stay clean. - **Composition, not inheritance.** Considered (and rejected) three inheritance variants: SEA-inherits-Thrift (drags in entire HiveServer2 chain — semantically wrong), shared base above both (blocked structurally — `HiveServer2Connection`'s parent is in the Apache package we don't own), and interface + extension methods (functionally identical to the static helper). C# single inheritance + unowned Apache base classes make pure inheritance impractical. - **Decorator at AdbcConnection boundary rejected.** A wrapper around the whole connection cannot see deep telemetry signals (chunk timing, poll count, first-batch latency) — those live inside statement classes. Wrong granularity. - **Refactor `Create` signature rather than add overload.** Changes the canonical `ConnectionTelemetry.Create` to take `string sessionId`. Thrift converts at the call boundary (`sessionHandle.SessionId.Guid.ToString()`). Eliminates the Thrift leak from the telemetry API permanently. - **Scoped strictly to SEA, plus `DriverMode.Sea` setter.** The cross-cutting Thrift gaps (workspace_id, auth_type, metadata-ops instrumentation, retry_count) are owned by the separate gap-fix workstream and will land independently. The only cross-cutting change pulled in here is unhooking the hardcoded `DriverMode.Types.Type.Thrift` in `BuildDriverConnectionParams` — without it SEA records would be silently mislabeled as Thrift. ## Areas needing specific review focus 1. **Observer interface shape & fail-open contract** (`§5.1`) — the observer methods, naming, and the requirement that implementations never throw. Specifically: are the 8 methods the right cut, or is finer granularity (e.g. split `OnChunksDownloaded` from `OnConsumed`) preferred? 2. **`ConnectionTelemetry.Create` signature change** (`§5.2`) — replaces `TSessionHandle?` with `string sessionId` and adds a `DriverMode mode` parameter. This touches a stable API used by the existing Thrift path; the Thrift call site must convert at the boundary. 3. **Result-format mapping for SEA** (`§8`) — SEA does not expose a typed `ResultFormat`. The mapping table from wire `disposition` × manifest state → proto `ExecutionResult.Format` is a judgment call; please review. 4. **Chunk-metrics dependency on gap-fix** (`§9`, `§16`) — this design assumes the `CloudFetchDownloader → ChunkMetrics → CloudFetchReader.GetChunkMetrics()` plumbing from the gap-fix workstream lands first or concurrently. If gap-fix is delayed, SEA ships with `ChunkMetrics.Empty` and backfills later. Is that acceptable, or should we sequence differently? 5. **Open questions** (`§17`): polling-granularity semantics for `PollCount`, `is_internal_call` semantics for SEA `USE SCHEMA`, and whether SEA's `operation_type` should always be `EXECUTE_STATEMENT_ASYNC` or map to sync-emulated variants. ## Related - Builds on the architecture in [`csharp/doc/telemetry-design.md`](../blob/stack/PECO-3022-sea-telemetry-design/csharp/doc/telemetry-design.md) - Supersedes the `TelemetryHelper` static-helper proposal in [`docs/designs/fix-telemetry-gaps-design.md`](../blob/stack/PECO-3022-sea-telemetry-design/docs/designs/fix-telemetry-gaps-design.md) for the new SEA code; Thrift-side gap-fix work continues independently - Jira: [PECO-3022](https://databricks.atlassian.net/browse/PECO-3022) This pull request and its description were written by Isaac. --------- Co-authored-by: Jade Wang <jade.wang+data@databricks.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
🥞 Stacked PR
Use this link to review incremental changes.
Summary
Wires
IConnectionTelemetryintoStatementExecutionConnectionso the SEA path emits the same connection-lifecycle events as Thrift (CREATE_SESSION,DELETE_SESSION), and aligns the connection-parameter fields users care about for cross-protocol comparator dashboards._telemetry: IConnectionTelemetryfield — defaults toNoOpConnectionTelemetry.Instance, replaced afterCreateSessionAsyncsucceeds. CallsConnectionTelemetry.Create(..., mode: DriverMode.Sea, ...)and emitsEmitCreateSessionTelemetryon session open /EmitDeleteSessionTelemetryon dispose. 5-second hard timeout onDisposeAsyncso a stuck exporter cannot hang connection close. Test seamTelemetryForTestingmirrors the Thrift pattern.socket_timeout(D1 + B11) — sourced from_httpClient.Timeout.TotalMillisecondsrather than the SEA query-wait timeout_waitTimeoutSeconds. In addition,CreateHttpClientnow honors a user-suppliedSparkParameters.ConnectTimeoutMillisecondswhen explicitly set, falling back toCloudFetchTimeoutMinutesotherwise. End result: the telemetry record'ssocket_timeoutreflects the actual HTTP timeout, matching Thrift exactly under default settings (900s).enable_direct_results(B10) — read from the user-suppliedDatabricksParameters.EnableDirectResultsproperty and emitted faithfully in telemetry, rather than the previous hardcodedtrue. SEA does not implement Thrift's direct-results optimization internally, but reflecting user intent keeps the wire record honest.Files touched
csharp/src/StatementExecution/StatementExecutionConnection.cs— telemetry field, OpenAsync/Dispose wiring, CreateHttpClient timeout source, observer construction inCreateStatement.csharp/test/Unit/StatementExecution/StatementExecutionConnectionTelemetryTests.cs(new) — exercises CREATE_SESSION/DELETE_SESSION emission, dispose timeout, fail-open initialization.Test plan
OpenAsyncsucceeds even if telemetry initialization throws (fail-open).Disposecompletes within 5 seconds even when the exporter is wedged (Dispose_FlushHangs_CompletesWithin5Seconds).CreateStatementfrom_telemetry.Session; falls back toNullObserver.Instancewhen telemetry is disabled.Related
ConnectionTelemetry.Create) and feat(csharp): add IStatementOperationObserver and fail-open implementations (PECO-3022) #461 (creates aTelemetryObserverper-statement).Part of PECO-3022.